Bias-Variance Trade-off and Overlearning in Dynamic Decision Problems
A. Max Reppen and H. Mete Soner
Recent advances in the training of neural networks make high-dimensional numerical studies feasible for decision problems in uncertain environments. Although reinforcement learning has been widely used in optimal control for several decades [6], only recently have Han and E [18] and Han et al. [20] combined it with Monte Carlo type regression for the off-line construction of optimal feedback actions. In these problems, the randomness and the state are observable, and a training set based on historical or simulated data is readily available. One then approximates the objective functions of these problems by empirical averages over this training data, constructing a loss function that is minimized over the network parameters. The minimizer, or a near-minimizer, is the trained network, which approximates the optimal feedback action.
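To make the construction concrete, the following is a minimal sketch of an empirical loss minimized over the parameters of a feedback action. It is an illustration under stated assumptions rather than the method of the paper: the one-step quadratic cost, the scalar linear feedback a(x) = theta * x used in place of a neural network, and the names `empirical_loss` and `train` are all hypothetical.

```python
import numpy as np

def empirical_loss(theta, states, noise):
    """Empirical average of a hypothetical quadratic cost over the training set."""
    actions = theta * states                 # linear feedback action a(x) = theta * x
    next_states = states + actions + noise   # assumed one-step dynamics
    return np.mean(next_states ** 2 + 0.1 * actions ** 2)

def train(states, noise, lr=0.05, steps=500):
    """Plain gradient descent on the empirical loss, starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        actions = theta * states
        next_states = states + actions + noise
        # Derivative of the empirical loss with respect to theta.
        grad = np.mean(2.0 * next_states * states + 0.2 * actions * states)
        theta -= lr * grad
    return theta

# Simulated training set: observed states and observed randomness (assumed Gaussian).
rng = np.random.default_rng(0)
states = rng.normal(size=1000)
noise = rng.normal(size=1000)

theta_hat = train(states, noise)

# For this quadratic cost the empirical minimizer is available in closed form.
m2 = np.mean(states ** 2)
mxz = np.mean(states * noise)
theta_closed_form = -(m2 + mxz) / (1.1 * m2)
```

In this toy setting the empirical minimizer depends on the sample cross-moment of the state and the noise, which vanishes in the population but not in a finite training set. The trained parameter therefore adapts to the particular sample as well as to the underlying problem.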